Abstract. We have applied a Learning Vector Quantization (LVQ) algorithm to SDSS DR5 quasar 
spectra in order to create a large catalogue of broad absorption line quasars (BALQSOs). We first 
discuss the problems with BALQSO catalogues constructed using the conventional balnicity and/or 
absorption indices (BI and AI), and then describe the supervised LVQ network we have trained to 
recognise BALQSOs. The resulting BALQSO catalogue should be substantially more robust and 
complete than BI- or AI-based ones. 
INTRODUCTION 

Broad absorption line quasars (BALQSOs) are a sub-class of active galactic nuclei 
(AGN) exhibiting strong, broad and blue-shifted spectroscopic absorption features ([?]) 
associated with strong winds of outflowing material reaching 0.1c-0.2c ([?]). 
BALQSOs are predominantly radio-quiet ([6]), and there are subtle differences between 
their continuum and emission line properties and those of "normal" (non-BAL) QSOs 
([?]). However, despite these differences, BALQSOs and non-BALQSOs appear to be 
drawn from the same parent population ([?]). 

The most straightforward explanation for the differences between QSOs and BALQSOs 
is a simple orientation effect. In this picture, all QSOs may undergo significant mass loss 
through winds, but BALs are only observed if the central continuum and/or emission 
line source is viewed directly through the outflowing material. Viewed in this context, 
BALQSOs may be the only available tracers of a key physical process common to all 
AGN. Moreover, the fraction of QSOs displaying BAL features (f_BALQSO) may provide 
a direct estimate of the opening angle of these outflows. 
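
As a rough illustration of how a BALQSO fraction maps onto an opening angle, the sketch below uses two idealised geometries. Both are assumptions of this example rather than models taken from the text: an equatorial wind confined to within ±theta of the equatorial plane, which covers a fraction sin(theta) of the sky as seen from the source, and a polar bicone of half-opening angle theta, which covers a fraction 1 - cos(theta). Randomly oriented sources are assumed and selection effects are ignored.

```python
import math


def equatorial_half_angle_deg(f_bal):
    """Angle theta (degrees) above/below the equatorial plane for a wind
    covering a sky fraction f_bal = sin(theta). Hypothetical geometry."""
    return math.degrees(math.asin(f_bal))


def polar_half_angle_deg(f_bal):
    """Half-opening angle theta (degrees) of a two-sided polar cone
    covering a sky fraction f_bal = 1 - cos(theta). Hypothetical geometry."""
    return math.degrees(math.acos(1.0 - f_bal))


if __name__ == "__main__":
    for f in (0.10, 0.15):
        print(f, equatorial_half_angle_deg(f), polar_half_angle_deg(f))
```

The two geometries give quite different angles for the same fraction, so any such estimate is only as good as the assumed wind geometry.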

Historically, most BALQSO samples were selected on the basis of the so-called 
balnicity index (BI; [2]) or similar metrics. These samples consistently yielded BALQSO 
fraction estimates in the range f_BALQSO ~ 0.10-0.15 ([?]). In a previous paper 
([?]; Paper I), we showed that the BI and a recently defined metric, the absorption index 
(AI; [?]), are respectively too strict and too relaxed when selecting BALQSOs. 
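
The text does not restate the BI definition, so the following sketch assumes the form in which it is commonly quoted: BI = integral from 3000 to 25000 km/s of [1 - f(v)/0.9] C dv, where v is the blueshift velocity relative to C IV, f(v) is the continuum-normalised flux, and C = 1 only once the bracket has been continuously positive for more than 2000 km/s (C = 0 otherwise). The function name, the argument names and the simple rectangle-rule integration are choices made for this example.

```python
import numpy as np


def balnicity_index(velocity, norm_flux, v_min=3000.0, v_max=25000.0,
                    min_width=2000.0):
    """Approximate BI (km/s) from blueshift velocities (km/s, increasing)
    and continuum-normalised flux. Illustrative sketch only."""
    v = np.asarray(velocity, dtype=float)
    f = np.asarray(norm_flux, dtype=float)
    keep = (v >= v_min) & (v <= v_max)
    v, f = v[keep], f[keep]
    if v.size == 0:
        return 0.0
    depth = 1.0 - f / 0.9             # positive where flux < 90% of continuum
    dv = np.diff(v, prepend=v[0])     # width assigned to each pixel
    bi = 0.0
    trough_start = None               # velocity where the current trough began
    for vi, di, dvi in zip(v, depth, dv):
        if di > 0:
            if trough_start is None:
                trough_start = vi
            # C = 1 only after the trough has persisted for > min_width
            if vi - trough_start > min_width:
                bi += di * dvi
        else:
            trough_start = None       # trough interrupted: reset
    return float(bi)
```

Because the first 2000 km/s of every trough contribute nothing, narrow but genuine absorption features yield BI = 0, which is the sense in which the BI is a strict criterion.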

Here we use the hybrid-LVQ approach from Paper I, which combines the classic BI, 
a simple neural network and visual inspection, to produce BALQSO samples 
that are more robust than AI-based ones, but more complete than purely BI-based ones. 
We apply it to the QSO sample associated with Data Release 5 (DR5) of the Sloan Digital Sky 
Survey (SDSS; [?]). Our catalogue contains 3505 BALQSOs selected from 28,421 
objects in the SDSS DR5 QSO sample in the redshift range 1.7 < z < 4.2. 

FIGURE 1. Flow diagram illustrating the steps involved in our hybrid-LVQ classification method. 

THE INPUT QSO SAMPLE 

The SDSS DR5 QSO catalogue contains over 77,000 objects in total [?]. However, for 
the purpose of constructing a uniform BALQSO catalogue, we only consider objects 
whose spectra fully cover the C IV 1550 Å resonance line, which displays a particularly 
deep and well-defined absorption trough in the spectra of most BALQSOs. Given the 
wavelength range covered by the SDSS spectra, this implies an effective redshift window 
of 1.7 < z < 4.2 for our QSO parent sample, which contains spectra of 28,421 objects. 

Our BALQSO classification method works on continuum-normalised spectra covering 
the wavelength range 1401-1700 Å with 1 Å dispersion. It also uses the associated 
BIs for training the neural network and to flag borderline cases requiring visual 
inspection. We therefore normalise all QSO spectra using the method described in [?], in 
which each spectrum is fit with a modified DR5 QSO composite allowing for 
object-to-object differences in reddening and overall spectral slope [14]. We then interpolate each 
spectrum onto the new wavelength grid and estimate the BI in the same way as [?]. 
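
The resampling step can be sketched as follows. This is a minimal example and not the pipeline actually used: it assumes the grid is defined in the rest frame, that linear interpolation is adequate, and that the observed wavelengths are sorted in increasing order. The names `REST_GRID` and `to_rest_grid` are invented for the illustration.

```python
import numpy as np

# 1401, 1402, ..., 1700 Angstrom: 300 pixels at 1 Angstrom spacing
REST_GRID = np.arange(1401.0, 1701.0, 1.0)


def to_rest_grid(obs_wave, norm_flux, z, grid=REST_GRID):
    """Shift an observed, continuum-normalised spectrum to the rest frame
    and linearly interpolate it onto a fixed wavelength grid.

    obs_wave must be increasing. Grid points outside the spectrum's
    coverage take the nearest end-point flux (np.interp behaviour), so
    coverage should be checked beforehand.
    """
    rest_wave = np.asarray(obs_wave, dtype=float) / (1.0 + z)
    return np.interp(grid, rest_wave, np.asarray(norm_flux, dtype=float))
```

Putting every spectrum on the same 300-pixel grid gives the classifier fixed-length input vectors in which a given element always corresponds to the same rest wavelength.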



HYBRID-LVQ SELECTION OF BALQSOS 

The method we use to classify BALQSOs has already been described in detail in [?], 
so we only provide an overview of the key points here. Briefly, our method is a hybrid 
of BI-based, neural network and visual classifications. It is designed to produce a more 
complete BALQSO sample than a pure BI selection without significantly increasing the 
number of false positives. Starting with a BI-based classification, we use a simple neural 
network-based machine learning algorithm called "learning vector quantization" (LVQ; 
[?]) to identify objects that might have been misclassified by the BI. All such objects 
are then inspected and classified visually. The way in which we train our LVQ network 
to recognize BALQSOs has been described in detail in Paper I. Note that redshift 
uncertainties are explicitly taken into account by our network. Below, we will sometimes 
refer to the full hybrid method as "LVQ-based", but it is always worth keeping in mind 
that LVQ is only one part of a process also involving the BI and visual inspection. 
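
For readers unfamiliar with LVQ, the sketch below implements the basic LVQ1 update rule: the prototype nearest to a training vector is moved towards it if their labels match and away from it otherwise. It is a generic illustration and not the network of Paper I: the number of prototypes, the learning-rate schedule and the function names are assumptions, and the treatment of redshift uncertainties is not included.

```python
import numpy as np


def train_lvq1(X, y, n_protos_per_class=2, lr=0.1, epochs=20, seed=0):
    """Train LVQ1 prototypes on vectors X (n_samples, n_features) with
    class labels y. Returns (prototypes, prototype_labels)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    protos, labels = [], []
    # Initialise prototypes from randomly chosen members of each class
    for c in np.unique(y):
        idx = rng.choice(np.flatnonzero(y == c), size=n_protos_per_class,
                         replace=False)
        protos.append(X[idx])
        labels.extend([c] * n_protos_per_class)
    protos = np.vstack(protos)
    labels = np.array(labels)
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)   # linearly decaying step size
        for i in rng.permutation(len(X)):
            k = np.argmin(np.sum((protos - X[i]) ** 2, axis=1))
            sign = 1.0 if labels[k] == y[i] else -1.0
            protos[k] += sign * alpha * (X[i] - protos[k])
    return protos, labels


def predict_lvq(protos, labels, X):
    """Assign each vector in X the label of its nearest prototype."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return labels[np.argmin(d2, axis=1)]
```

In the present context, X would hold the resampled, continuum-normalised spectra and y an initial BAL/non-BAL labelling; how those training labels are derived from the BI is described in Paper I and is not reproduced here.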

PROPERTIES OF THE FINAL BALQSO CATALOGUE 

Fig. 1 shows a flow diagram of the steps involved in creating the final DR5 BALQSO 
catalogue using the hybrid-LVQ network. Our LVQ-based method classifies 3,385 of 
the 28,421 QSOs (11.91% ± 0.21%) in our DR5 parent sample as BALQSOs, and the 
catalogue can be found online. It is reassuring to note that the LVQ and BI 
classifications agree for over 92% of the objects. The objects on which the two methods 
disagree are inspected and classified visually. Overall, we find that 29% (848) of 
the BI > 0 objects require visual inspection, whilst only 5% (1224) of the BI = 0 
objects do. At first sight, this might seem quite a high fraction for the BI > 0 objects. Fig. 
2 shows composites in various AI and BI bins. The top-left panel in the figure shows a 
composite made from 1082 objects with 0 < BI < 500 and 1 < AI < 500. For comparison, 
a composite produced from objects with BI = 0 and AI = 0 has been overplotted as a dashed line. 
No signs of absorption are present, showing that QSOs with BI > 0 are not necessarily 
BALs (for more examples see [9]). The other panels in Fig. 2 show composites created 
for higher AIs and BIs. The composite containing the most QSOs is the middle-left one, 
which could also be regarded as the region of AI-BI space containing the most 
borderline cases. This again helps explain the high fraction of objects on which the BI and 
LVQ disagree. 
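
The decision logic of the hybrid scheme can be summarised in a few lines. The sketch assumes that the BI-based class is simply BI > 0 and that a visual verdict is available for every object; in practice only the objects on which BI and LVQ disagree would be inspected. The function and argument names are illustrative.

```python
import numpy as np


def final_classification(bi, lvq_is_bal, visual_is_bal):
    """Combine BI, LVQ and visual classifications.

    Returns (final_is_bal, needs_inspection). Where BI and LVQ agree,
    their common verdict is kept; where they disagree, the visual
    verdict decides.
    """
    bi_is_bal = np.asarray(bi, dtype=float) > 0
    lvq = np.asarray(lvq_is_bal, dtype=bool)
    visual = np.asarray(visual_is_bal, dtype=bool)
    needs_inspection = bi_is_bal != lvq
    final = np.where(needs_inspection, visual, bi_is_bal)
    return final, needs_inspection
```

For example, an object with BI = 0 that LVQ flags as a BALQSO is sent for inspection and enters the catalogue only if the visual verdict confirms it.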

In order to explore our results in more detail, we present further composites in Fig. 3. 
The top row of panels shows composites produced from QSOs that were finally tagged 
as BALQSOs, whilst the bottom row shows composites of objects finally tagged as 
non-BALQSOs. As expected, the top-centre composite displays a very narrow absorption 
feature, which is missed by the BI calculation due to its conservative definition. Next we 
consider the bottom-centre panel, which displays the composite of objects with BI > 0 but 
a non-BALQSO final tag. The spectrum is redder than the non-BALQSO composite, but 
shows no clear signs of line absorption. 
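
A composite of the kind discussed above can be built by stacking all spectra that fall in a given selection and combining them pixel by pixel. The text does not state how the spectra were combined, so the median used here is an assumption, as are the function and variable names.

```python
import numpy as np


def composite(spectra, mask):
    """Median composite of the rows of `spectra` selected by boolean `mask`.

    spectra: array (n_objects, n_pixels) on a common rest-frame grid.
    """
    spectra = np.asarray(spectra, dtype=float)
    selected = spectra[np.asarray(mask, dtype=bool)]
    if selected.shape[0] == 0:
        raise ValueError("no spectra selected for this composite")
    return np.median(selected, axis=0)


# Example selection: objects in one AI-BI bin (toy values)
bi = np.array([100.0, 200.0, 0.0])
ai = np.array([50.0, 300.0, 0.0])
toy_spectra = np.array([[1.0, 1.0, 1.0, 1.0],
                        [0.5, 1.0, 1.0, 1.0],
                        [0.0, 0.0, 0.0, 0.0]])
in_bin = (bi > 0) & (bi < 500) & (ai > 1) & (ai < 500)
bin_composite = composite(toy_spectra, in_bin)
```

A median is less sensitive than a mean to a few objects with very deep troughs, which matters when the aim is to see whether a typical member of the bin shows absorption.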

CONCLUSIONS 

Compiling BAL quasar catalogues and determining the observed BALQSO fraction is 
a challenging task. Most of the problem resides in the ambiguity one encounters when 
attempting to classify individual absorption features. After all, it is human nature 
to be subjective, even when one tries to be objective. The judgment will be based on previous 
examples of what is and what is not a BAL. This is why we have used a hybrid method 
consisting of a simplified neural network together with visual inspection to create a 
hopefully near-optimal BALQSO catalogue. 

FIGURE 2. Composites in various AI-BI ranges (blue line), and composites created from AI = 0 and 
BI = 0 objects. Reddening has not been taken into account. 

FIGURE 3. Various composites created in order to check our hybrid-LVQ method (blue line), and 
composites created from AI = 0 and BI = 0 objects. LVQ tag = 1 if LVQ classified the objects as BALQSOs, 
whilst Final tag = 1 is set for objects which made it into our final BALQSO catalogue. Reddening has not 
been taken into account. 

In [?] and this work we have shed light on many classification problems that arise when 
dealing with BALQSOs taken from contemporary surveys. We showed that when the 
recently introduced "absorption index" (AI) is used to classify BALQSOs, the resulting 
log AI distribution is clearly bimodal, with both modes containing comparable numbers of 
objects, but only the high-AI mode clearly being associated with genuine BALQSOs. 
Moreover, in our previous paper, we showed how even the traditional "balnicity index" 
(BI) produces incomplete BALQSO samples. It is likely that, due to the diverse nature of 
observed BAL troughs, conventional metrics are no longer appropriate given the large 
increase in the data volume of observed QSO samples. It also seems unfeasible to 
keep defining new metrics in order to deal with problems caused by the old ones. Here we have 
shown how a hybrid algorithm can overcome these problems, taking into account the 
increasing data volume gathered by contemporary astronomical surveys. 

The observed fraction is, however, still subject to serious selection effects. In [?] 
we have explored these and corrected for colour-, magnitude- and redshift-dependent 
selection biases on the DR3 dataset. After applying the corrections we reached the 
conclusion that there is no compelling evidence for redshift evolution in the intrinsic 
BALQSO fraction. 